Getting Started with Logstash

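Introduction to Logstash

  • Centralize, transform, and store data

    Logstash is an open-source, server-side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to your favorite "stash".

  • Data Shipper

  • ETL

    • Extract (collect data)
    • Transform (convert data)
    • Load (deliver data)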

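Logstash Processing Pipeline

  • Input # where events are read from; common input sources:
    • file
    • Redis
    • beats
    • kafka
    • …
  • Filter # how events are processed; common filters:
    • grok
    • mutate
    • drop
    • date
  • Output # where events are sent; common output destinations:
    • stdout
    • Elasticsearch
    • …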
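
The three stages correspond to three top-level blocks in a pipeline configuration file. A minimal sketch, assuming a Beats shipper sending to port 5044 and an Elasticsearch node at localhost:9200; the port, host, index name, and added field are illustrative assumptions, not values used elsewhere in this article:

```
# Sketch of a three-stage pipeline (port, host, and index name are assumed)
input {
  beats {
    port => 5044                            # port the Beats shipper sends to (assumed)
  }
}

filter {
  mutate {
    add_field => { "pipeline" => "demo" }   # hypothetical field, for illustration only
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]             # assumed local Elasticsearch node
    index => "demo-%{+YYYY.MM.dd}"          # hypothetical daily index name
  }
}
```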

Input and Output Configuration

Example:

# Input configuration
input { file { path => "/tmp/abc.log" } }

# Output configuration
output { stdout { codec => rubydebug } }
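
By default the file input tails the file and only picks up lines appended after Logstash starts. When experimenting with an existing log, it is often convenient to read it from the beginning instead. A sketch using the same /tmp/abc.log path, assuming a Unix-like system where /dev/null is available:

```
input {
  file {
    path => "/tmp/abc.log"
    start_position => "beginning"   # read existing content, not only new lines
    sincedb_path => "/dev/null"     # do not persist the read position (testing only)
  }
}

output {
  stdout { codec => rubydebug }
}
```

Discarding the sincedb this way makes Logstash re-read the whole file on every start, which suits quick tests but is usually not what a long-running deployment wants.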

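Filter Configuration

  • Grok
    • Provides a rich library of reusable patterns built on regular expressions
    • Uses these patterns to parse unstructured data into structured fields
  • Date
    • Converts a time field stored as a string into a timestamp, which simplifies later processing
  • Mutate
    • Adds, modifies, removes, or replaces fields
  • …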
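
The date, mutate, and drop filters listed above can be combined in one filter block. A rough sketch; the field names log_time, level, server, and env are hypothetical and only serve to illustrate the options:

```
filter {
  # Parse a string field and use it as the event's @timestamp
  date {
    match => [ "log_time", "ISO8601" ]   # "log_time" is a hypothetical field
  }

  # Rename, add, and remove fields
  mutate {
    rename       => { "host" => "server" }
    add_field    => { "env" => "test" }
    remove_field => [ "log_time" ]
  }

  # Discard events that are not worth keeping
  if [level] == "DEBUG" {
    drop { }
  }
}
```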

Example:

# Data to parse
# 55.3.244.1 GET /index.html 15824 0.043

# Grok pattern
%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}

# Parsed result
{
    "client": "55.3.244.1",
    "method": "GET",
    "request": "/index.html",
    "bytes": "15824",
    "duration": "0.043"
}
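
In a pipeline, the pattern goes into the match option of a grok filter. Grok captures every field as a string by default, which is why bytes and duration appear quoted in the result above; appending :int or :float to a capture converts the value. A sketch, assuming the raw line arrives in the message field:

```
filter {
  grok {
    match => {
      # :int and :float convert the captured text to numbers
      "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes:int} %{NUMBER:duration:float}"
    }
  }
}
```

With this variant, bytes and duration should be emitted as numbers rather than strings, which matters once the events are aggregated downstream.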

Collecting nginx Logs

  1. Download Logstash

    Download page: https://www.elastic.co/cn

  2. Run Logstash

    Extract the archive

    [jlc@localhost es]$ tar -zxf logstash-6.1.1.tar.gz

    Logstash configuration file (nginx_logstash.conf)

    # Logstash reads events from standard input
    input {
      stdin { }
    }
    
    # Filter stage: process each event
    filter {
    
      # Parse the unstructured log line into structured fields
      grok {
        match => {
          "message" => '%{IPORHOST:remote_ip} - %{DATA:user_name} \[%{HTTPDATE:time}\] "%{WORD:request_action} %{DATA:request} HTTP/%{NUMBER:http_version}" %{NUMBER:response} %{NUMBER:bytes} "%{DATA:referrer}" "%{DATA:agent}"'
        }
      }
    
      # Parse the time string and use it as the event's timestamp (@timestamp)
      date {
        match => [ "time", "dd/MMM/YYYY:HH:mm:ss Z" ]
        locale => en
      }
    
      # Look up the geographic location of the client IP
      geoip {
        source => "remote_ip"
        target => "geoip"
      }
    
      # Parse the User-Agent string
      useragent {
        source => "agent"
        target => "user_agent"
      }
    }
    
    # Write events to standard output, formatted by the rubydebug codec
    output {
      stdout {
        codec => rubydebug 
      }
    }

    Sample data to process

    # [jlc@localhost logstash-6.1.1]$ head -n 2 ../access.log
    127.0.0.1 - - [27/Jan/2020:23:15:54 +0800] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"
    127.0.0.1 - - [27/Jan/2020:23:38:36 +0800] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"

    Run Logstash

    [jlc@localhost logstash-6.1.1]$ head -n 2 ../access.log | bin/logstash -f nginx_logstash.conf

  3. Standard output

    Sending Logstash's logs to /usr/local/es/logstash-6.1.1/logs which is now configured via log4j2.properties
    [2020-04-03T01:09:01,830][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"/usr/local/es/logstash-6.1.1/modules/fb_apache/configuration"}
    [2020-04-03T01:09:01,868][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"netflow", :directory=>"/usr/local/es/logstash-6.1.1/modules/netflow/configuration"}
    [2020-04-03T01:09:02,084][INFO ][logstash.setting.writabledirectory] Creating directory {:setting=>"path.queue", :path=>"/usr/local/es/logstash-6.1.1/data/queue"}
    [2020-04-03T01:09:02,091][INFO ][logstash.setting.writabledirectory] Creating directory {:setting=>"path.dead_letter_queue", :path=>"/usr/local/es/logstash-6.1.1/data/dead_letter_queue"}
    [2020-04-03T01:09:02,797][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
    [2020-04-03T01:09:02,968][INFO ][logstash.agent           ] No persistent UUID file found. Generating new UUID {:uuid=>"5de9619e-a0ef-4ba6-a53d-0a966ee65eb9", :path=>"/usr/local/es/logstash-6.1.1/data/uuid"}
    [2020-04-03T01:09:04,137][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.1.1"}
    [2020-04-03T01:09:05,219][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
    [2020-04-03T01:09:10,505][INFO ][logstash.filters.geoip   ] Using geoip database {:path=>"/usr/local/es/logstash-6.1.1/vendor/bundle/jruby/2.3.0/gems/logstash-filter-geoip-5.0.2-java/vendor/GeoLite2-City.mmdb"}
    [2020-04-03T01:09:10,991][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>250, :thread=>"#"}
    [2020-04-03T01:09:11,143][INFO ][logstash.pipeline        ] Pipeline started {"pipeline.id"=>"main"}
    [2020-04-03T01:09:11,419][INFO ][logstash.agent           ] Pipelines running {:count=>1, :pipelines=>["main"]}
    {
                 "geoip" => {},
               "request" => "/",
          "http_version" => "1.1",
              "@version" => "1",
                 "bytes" => "612",
                  "tags" => [
            [0] "_geoip_lookup_failure"
        ],
            "user_agent" => {
                 "os" => "Windows 10",
             "device" => "Other",
            "os_name" => "Windows 10",
              "build" => "",
              "minor" => "0",
               "name" => "Chrome",
              "major" => "78",
              "patch" => "3904"
        },
               "message" => "127.0.0.1 - - [27/Jan/2020:23:15:54 +0800] \"GET / HTTP/1.1\" 200 612 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36\"\r",
        "request_action" => "GET",
                  "time" => "27/Jan/2020:23:15:54 +0800",
            "@timestamp" => 2020-01-27T15:15:54.000Z,
                 "agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36",
                  "host" => "localhost.localdomain",
             "user_name" => "-",
              "response" => "200",
              "referrer" => "-",
             "remote_ip" => "127.0.0.1"
    }
    {
                 "geoip" => {},
               "request" => "/",
          "http_version" => "1.1",
              "@version" => "1",
                 "bytes" => "0",
                  "tags" => [
            [0] "_geoip_lookup_failure"
        ],
            "user_agent" => {
                 "os" => "Windows 10",
             "device" => "Other",
            "os_name" => "Windows 10",
              "build" => "",
              "minor" => "0",
               "name" => "Chrome",
              "major" => "78",
              "patch" => "3904"
        },
               "message" => "127.0.0.1 - - [27/Jan/2020:23:38:36 +0800] \"GET / HTTP/1.1\" 304 0 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36\"\r",
        "request_action" => "GET",
                  "time" => "27/Jan/2020:23:38:36 +0800",
            "@timestamp" => 2020-01-27T15:38:36.000Z,
                 "agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36",
                  "host" => "localhost.localdomain",
             "user_name" => "-",
              "response" => "304",
              "referrer" => "-",
             "remote_ip" => "127.0.0.1"
    }
    [2020-04-03T01:09:12,993][INFO ][logstash.pipeline        ] Pipeline terminated {"pipeline.id"=>"main"}

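    Note: the _geoip_lookup_failure tag

    This tag appears because the client address 127.0.0.1 is a loopback address. Loopback and private addresses are not in the GeoIP database, so they cannot be resolved to a geographic location.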
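
    One way to avoid the tag is to run geoip only for addresses that can be looked up. A sketch that skips loopback and RFC 1918 private IPv4 ranges using a conditional; the regular expression is an illustrative assumption and does not cover IPv6 or other reserved ranges:

    ```
    filter {
      # Run geoip only when remote_ip is not a loopback or private IPv4 address
      if [remote_ip] !~ /^(127\.|10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)/ {
        geoip {
          source => "remote_ip"
          target => "geoip"
        }
      }
    }
    ```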


   Reprint policy


"Getting Started with Logstash" by Jiavg is licensed under the Creative Commons Attribution 4.0 International License.